[2/3] queue-based autoscaling - add default queue-based autoscaling policy #59548
base: master
Conversation
Code Review
This pull request introduces a new queue-based autoscaling policy, which is a great addition for TaskConsumer deployments. The implementation is well-structured, with a dedicated QueueMonitor actor and comprehensive unit tests. I've identified a critical bug in the Redis connection handling and a high-severity logic issue in the scaling-to-zero implementation. Addressing these will ensure the new feature is robust and behaves as expected.
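For context, a minimal sketch of what such a queue monitor actor could look like (the class and method names follow the PR's description, but the Redis handling shown here is an assumption, not the PR's actual code):

```python
# Hypothetical sketch of a queue monitor actor; Redis details are assumptions.
import ray
import redis


@ray.remote(num_cpus=0)
class QueueMonitor:
    def __init__(self, redis_url: str, queue_name: str):
        # Create the client once and reuse it, rather than reconnecting on
        # every poll of the queue length.
        self._client = redis.Redis.from_url(redis_url)
        self._queue_name = queue_name

    def get_queue_length(self) -> int:
        # LLEN returns 0 when the key does not exist.
        return int(self._client.llen(self._queue_name))
```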
This pull request has been automatically marked as stale because it has not had recent activity. You can always ask for help on our discussion forum or Ray's public slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed.
Signed-off-by: harshit <[email protected]>
Force-pushed from 86223e0 to 26d4bd7
Signed-off-by: harshit <[email protected]>
Signed-off-by: harshit <[email protected]>
Force-pushed from 29f8b0b to 0c7cf30
abrarsheikh left a comment
I would base this PR on top of #58857.
DEFAULT_COMBINED_WORKLOAD_AUTOSCALING_POLICY = (
    "ray.serve.autoscaling_policy:default_combined_workload_autoscaling_policy"
)
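For reference, this constant is a "module:attribute" import path; a generic sketch of how such a path can be resolved (Ray Serve has its own loading logic, so this is illustrative only):

```python
# Generic resolver for a "module:attribute" policy path like the one above.
import importlib


def load_policy(policy_path: str):
    module_name, attr_name = policy_path.split(":")
    return getattr(importlib.import_module(module_name), attr_name)
```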
This can be named better; combined_workload is an overloaded term.
@abrarsheikh, how about DEFAULT_QUEUE_AWARE_AUTOSCALING_POLICY? It seems okay to me and not that overloaded. Let me know your view on it, and I will then amend the code accordingly.
I would choose default_async_inference_autoscaling_policy.
default_combined_workload_autoscaling_policy: to the reader it's not clear what workloads we are combining.
DEFAULT_QUEUE_AWARE_AUTOSCALING_POLICY: which queue? The other autoscaling policy is also queue-aware, but the queue there refers to ongoing requests.
@@ -1,12 +1,15 @@
import logging
import math
Many of the changes in this file overlap with changes in PR #58857. I suggest reviewing the other PR and making sure it's compatible with what we need to do here.
Ack, will review the other PR.
queue_length = ray.get(
    queue_monitor_actor.get_queue_length.remote(), timeout=5.0
)
This would make one RPC call per controller iteration. What is the impact of this on the scalability of the controller? Do we need to query the queue length in every iteration?
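One possible direction (a sketch with assumed names, not part of this PR) is to cache the queue length and only refresh it every few seconds instead of issuing an RPC on every control-loop iteration:

```python
# Hypothetical helper: throttle queue-length RPCs to at most one per
# poll_interval_s, returning the cached value in between.
import time

import ray


class CachedQueueLength:
    def __init__(self, queue_monitor_actor, poll_interval_s: float = 10.0):
        self._actor = queue_monitor_actor
        self._poll_interval_s = poll_interval_s
        self._last_poll_time = 0.0
        self._cached_length = 0

    def get(self) -> int:
        now = time.monotonic()
        if now - self._last_poll_time >= self._poll_interval_s:
            self._cached_length = ray.get(
                self._actor.get_queue_length.remote(), timeout=5.0
            )
            self._last_poll_time = now
        return self._cached_length
```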
@pytest.fixture
def queue_monitor_mock():
I would avoid such heavy mocking. Instead, figure out a way to write the tests using dependency injection; study how the existing tests are written in test_deployment_state.
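A minimal sketch of the dependency-injection style being suggested (hypothetical names and signatures, not the PR's actual API):

```python
# The code under test takes a "queue length provider" instead of reaching out
# to a real QueueMonitor actor, so tests can pass a plain fake object.
class FakeQueueLengthProvider:
    def __init__(self, length: int):
        self._length = length

    def get_queue_length(self) -> int:
        return self._length


def compute_total_workload(queue_provider, num_ongoing_requests: int) -> int:
    # Production code would pass a thin wrapper around the QueueMonitor actor;
    # tests inject the fake above, so no ray.remote mocking is required.
    return queue_provider.get_queue_length() + num_ongoing_requests


def test_total_workload_includes_queue_depth():
    assert compute_total_workload(FakeQueueLengthProvider(25), num_ongoing_requests=5) == 30
```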
I am not sure of the timeline we are targeting for PR #58857, but since we want to get the queue-based autoscaling feature out ASAP, I thought of merging this PR as it is. Then, once #58857 is merged, I will create a new PR that uses the changes from #58857 to refactor the queue-aware autoscaling policy. @abrarsheikh, let me know your thoughts on it.
Summary
This PR adds a new combined_workload_autoscaling_policy function that enables Ray Serve deployments to scale based on combined workload from both message queue depth and HTTP requests. This is the second part of the queue-based autoscaling feature for TaskConsumer deployments.
Related PRs:
Changes
New Components
- combined_workload_autoscaling_policy() in autoscaling_policy.py
- _apply_scaling_decision_smoothing() in autoscaling_policy.py
- get_queue_monitor_actor_name() in queue_monitor.py
- DEFAULT_COMBINED_WORKLOAD_AUTOSCALING_POLICY in constants.py
Files Modified
- python/ray/serve/autoscaling_policy.py - Added new policy and helper functions
- python/ray/serve/_private/constants.py - Added policy constant
- python/ray/serve/_private/queue_monitor.py - Added get_queue_monitor_actor_name() helper
Files Added
- python/ray/serve/tests/unit/test_queue_autoscaling_policy.py - 22 unit tests
How It Works
Scaling Formula
Example:
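As an illustrative sketch only (assumed parameter names and a simplified formula, not necessarily the exact one implemented in this PR), a combined-workload policy could compute the target replica count from queue depth plus in-flight requests:

```python
# Simplified combined-workload scaling sketch with a worked example.
import math


def target_replicas(
    queue_length: int,
    num_ongoing_requests: int,
    target_ongoing_requests_per_replica: int,
    min_replicas: int = 1,
) -> int:
    total_workload = queue_length + num_ongoing_requests
    return max(
        min_replicas,
        math.ceil(total_workload / target_ongoing_requests_per_replica),
    )


# Worked example: 40 queued messages + 10 in-flight HTTP requests with a
# target of 5 per replica -> ceil(50 / 5) = 10 replicas.
assert target_replicas(40, 10, 5) == 10
```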
🤖 Generated with https://claude.com/claude-code